AITopics | atari game

20e6b4dd2b1f82bc599c593882f67f75-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 19:40:32 GMT

artificial intelligence, foveal observation size, machine learning, (15 more...)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

To facilitate the following derivation, we rewrite the objective J E+I(E+I) JE(E): J E+I(E+I) JE(E) = E E+ I h 1X

Neural Information Processing Systems, Apr-25-2026, 01:03:39 GMT

A.1 Full derivation. We present the complete derivation of the objective function in each subproblem defined in Section 3.2. For brevity, let rt =(1+)rEt +rIt and V EE (st)= Vt. Under this assumption, E serves as 0 (see above). This enables updating E+I using the local approximation. We leave relaxing this assumption as future work.

artificial intelligence, machine learning, objective, (19 more...)

Industry: Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

114292cf3f930ba157ed33f66997fee2-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 15:48:33 GMT

artificial intelligence, machine learning, policy change, (16 more...)

Industry: Leisure & Entertainment > Games (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)

Mandi Zhao

Neural Information Processing Systems, Feb-19-2026, 09:45:44 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? If you ran experiments... (a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both main text and appendix for experiment details. Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? All adaptation experiments in Procgen and RLBench are run for 3 seeds. Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? As stated in section 2, we use RTX A5000 GPUs each with 24GB memory. C2F-ARM algorithm and training framework are built based on the original author's implementation... Did you mention the license of the assets?

artificial intelligence, experiment, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

0b96d81f0494fde5428c7aea243c9157-Supplemental.pdf

Neural Information Processing Systems, Feb-18-2026, 22:30:24 GMT

component description observation state index, maximum step, tabular sgd, (11 more...)

Country: North America > Canada (0.04)

Industry: Leisure & Entertainment (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Resetting the Optimizer in Deep RL: An Empirical Study

Neural Information Processing Systems, Feb-17-2026, 16:08:22 GMT

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first-order and the second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next one. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.
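The resetting idea described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hand-written Adam on a one-parameter toy problem in place of Rainbow, and it treats each "iteration" as a new regression target. The names `Adam`, `reset`, and `train` are hypothetical and chosen only for this example.

```python
import math


class Adam:
    """Minimal Adam over a flat list of floats (illustrative only)."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.n_params = n_params
        self.reset()

    def reset(self):
        # Discard the first- and second-moment estimates and the step count,
        # so the next update behaves like the first update of a fresh optimizer.
        self.m = [0.0] * self.n_params
        self.v = [0.0] * self.n_params
        self.t = 0

    def step(self, params, grads):
        # Standard Adam update with bias correction, applied in place.
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return params


def train(targets, steps_per_iteration, reset_optimizer, lr=0.1):
    """Solve a sequence of toy problems min_w (w - target)^2, one per iteration.

    Assumption: the changing target stands in for the changing loss function
    across iterations that the abstract describes.
    """
    params = [0.0]
    opt = Adam(n_params=1, lr=lr)
    for target in targets:
        if reset_optimizer:
            opt.reset()  # start each new optimization problem with clean moments
        for _ in range(steps_per_iteration):
            grad = [2.0 * (params[0] - target)]
            opt.step(params, grad)
    return params, opt


if __name__ == "__main__":
    for flag in (False, True):
        params, opt = train([1.0, -1.0, 1.0], 20, reset_optimizer=flag)
        print(f"reset={flag}: w={params[0]:.4f}, optimizer step count={opt.t}")
```

With resetting enabled, the step counter and both moment estimates restart at every iteration, so gradients from an earlier loss no longer influence the update direction or step size for the current one. The toy problem makes no claim about which variant performs better; it only shows where the reset call would sit in a training loop.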

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: